Lab 01

Learning objectives

By the end of the lab, you will be able to …

  • set up a reproducible workflow using R and RStudio
  • familiarize yourself with a dataset using R and RStudio
  • create a reproducible report using Quarto

Learning to Code

Technology is fun!


In this course, you’re learning not just the statistical concepts but also how to produce the statistics. Analyzing data requires learning to use new technology.


Learning statistical software to analyze data can be really fun. You get to learn about real-world social problems!

Technology is challenging!

It can be frustrating.


When it feels like the technology is preventing you from getting to the course content, take a deep breath, and remember that building your technology skills is part of this course.

Why am I making you learn something so frustrating?


Calculating the statistics by hand quickly gets cumbersome, time-consuming, and difficult.


Good social science is built on replication.

Grappling

Learning to use statistical software necessitates grappling.

Grappling implies trying even before you fail the first time.


It’s thinking, “First, I’ll work with it independently. Okay, I’m really not understanding it. Let me go back to my notes. Okay, I have solved for the first part of it. Now I have the second part of it. Okay, I got the question wrong; let me try again. Maybe I can ask my peer now.”


Grappling is working hard to make sure you fully understand the problem and then using every resource at your fingertips to solve it.

Most statistical analyses get done not because the analyst is a math genius, but because they persisted through a minefield of technical issues by being excellent problem-solvers.

Coding is mostly Googling


It is a misconception that the best statistical analysts sit down at their computers and type code from memory.


Much of the process of coding is copying code from somewhere else and modifying it to fit your particular situation.

When you get stuck…

…there are many options to get unstuck:

  • Review the slides. Pay very close attention to small details.
  • Try something else to see if you get a new error.
  • Use Google to search for possible answers or new explanations.
  • Watch a help video on YouTube on the topic.
  • Re-start your web-browser or device.
  • Try another web-browser or device.
  • Ask a peer or an advanced student.
  • Start or join a weekly study group.
  • Post the question on the class discussion board.
  • Email your TA.

Help in this class

Before requesting an individual meeting with a TA:

  • Spend sufficient time working on the problem on your own.
  • Ask two of your peers.
  • Post the question on the class discussion board.

When emailing:

  • Explain what troubleshooting steps you’ve already taken. 
  • Report who you’ve already asked for help. 

Create a trail!

Create a reproducible example


Goal: Make someone else feel your pain!

  • Assume others know nothing about your issue. 
  • Describe your steps to create the problem so that someone else can replicate it. 
  • This means clearly describing the issue and the steps you’ve already taken to solve it. 
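A minimal sketch of what a reproducible example can look like in R. It uses the built-in mtcars data set so that anyone can run it, and dput() prints a copy of the data that others can paste straight into their own session. The column-name typo (mpgg) is a deliberate, hypothetical bug standing in for "your problem."

```r
# Share a small slice of the data in a form others can paste and run
small <- head(mtcars[, c("mpg", "cyl")], 3)
dput(small)

# The exact line that causes the problem
# (hypothetical typo: "mpgg" instead of "mpg")
result <- mean(small$mpgg)
result
```

Running this prints NA along with a warning, which is exactly the kind of output worth pasting into a post together with the code. If you want code and output formatted together for sharing, the reprex package's reprex() function is built for that.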

Good etiquette

Search for answers before posting your question.
Let me google that for you. 🙄 

Describe the problem.
“It doesn’t work” isn’t descriptive enough. 

Describe your environment.
What operating system are you using? Which R version? What packages? Dataset?
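Base R can report most of this for you. A short sketch; the stats package is only an example, so substitute the packages you actually load:

```r
# R version string, e.g. "R version 4.5.1 (2025-06-13 ucrt)"
r_version <- R.version.string

# Operating system family: "Windows", "Darwin" (macOS), or "Linux"
os <- Sys.info()[["sysname"]]

# Version of one specific package (here, the base 'stats' package)
stats_version <- packageVersion("stats")

r_version
os
stats_version
```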

Describe the solution.
Confirm whether an offered solution works. Or, if you solve the problem on your own, post how you solved it.

R & RStudio Workflow

Replication

The guiding principle for workflow.

A workflow of data analysis is a process for managing all aspects of data analysis.


Planning, documenting, and organizing your work; cleaning the data; creating, renaming, and verifying variables; performing and presenting statistical analyses; producing replicable results; and archiving what you have done are all integral parts of your workflow.

Steps in a workflow

  • Set up: systematic organization of the project and project files.
  • Familiarize yourself with the data: skipping this step takes more time in the long run.
  • Process data: takes the MOST time.
  • Run analyses: what people THINK takes the most time.
  • Present results: what people (wrongly) think does not take time.

File types

There are many file types, but these are key to an R & RStudio workflow (and likely new to you):

  • .Rproj: RStudio project file (keeps project settings).
  • .R: R scripts store a sequence of R commands (code) that can be run all at once or line by line.
  • .qmd: Quarto documents are reproducible documents that combine text, code, and output.
  • .Rdata (or sometimes .rda): stores and loads R objects, such as data frames.
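A minimal sketch of how .Rdata files store and restore objects with base R's save() and load(). The data frame is made up, and the file goes to a temporary folder so the example leaves nothing behind in your project.

```r
# A small, made-up data frame
survey <- data.frame(id = 1:3, age = c(25, 40, 33))

# Save the object to an .Rdata file (temporary location for this sketch)
path <- file.path(tempdir(), "survey-example.Rdata")
save(survey, file = path)

# Remove it from memory, then load it back;
# load() restores the object under its original name
rm(survey)
load(path)
str(survey)
```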

File names

File names should:

  • be machine-readable
  • be human-readable
  • play well with default ordering
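To see why zero-padding numbers helps file names play well with default ordering, here is a small sketch with hypothetical file names. sort(method = "radix") compares strings character by character in C-locale order, so the result does not depend on your computer's language settings.

```r
# Hypothetical file names without zero-padding
unpadded <- c("plot2.png", "plot10.png", "plot1.png")
sort(unpadded, method = "radix")
# "plot1.png" "plot10.png" "plot2.png"  (10 lands before 2)

# sprintf() pads the numbers to two digits
padded <- sprintf("plot%02d.png", c(2L, 10L, 1L))
sort(padded, method = "radix")
# "plot01.png" "plot02.png" "plot10.png"
```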

RStudio Projects


Create an RStudio project for each data analysis project.

It supports an organized and reproducible workflow, cleanly separated from all other projects that you are working on. Everything you need in one place:

  • local data files to load into RStudio.
  • scripts to edit or run in bits or as a whole.
  • outputs you save (plots and cleaned data).

Filepaths

A project-based workflow means you don’t have to rewrite file paths when the project folder moves to another location or computer.


ABSOLUTE FILE PATHS

Department of Sociology
Unit 17100, 17th Floor, Ontario Power Building
700 University Ave., Toronto, ON M5G 1Z5

C:\Users\Pepin\GitHub\SOC6302\scripts

RELATIVE FILE PATHS

Take the left side elevators to the 17th floor.
Go through the double doors and take a right.
First door on your left.

here("scripts")
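In R, the two kinds of paths from this analogy might look like the sketch below. It assumes the here package is installed and that you are working inside an RStudio project (here() finds the project root by looking for markers such as the .Rproj file). Note that R accepts forward slashes in Windows paths; backslashes would have to be doubled inside an R string.

```r
library(here)

# Absolute path: spells out every folder from the drive down,
# so it breaks on any other computer (path taken from the slide)
abs_path <- "C:/Users/Pepin/GitHub/SOC6302/scripts"

# Relative path: here() starts at the project root, wherever that
# folder lives, and appends the pieces you give it
rel_path <- here("scripts")
rel_path
```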

Panes

There are four key regions or “panes” in the interface:

  1. Source pane: where you edit and save R scripts or author computational documents like Quarto and R Markdown.

  2. Console pane: where you run short, interactive R commands.

  3. Environment pane: displays the temporary R objects created during the current R session.

  4. Output pane: displays plots, tables, or HTML output from executed code, along with files saved to disk.

Source Pane

The top-left pane, which opens when you open any editable file in RStudio.

Quarto

Quarto: the tool you’ll use to create reproducible computational documents. Every assignment you hand in will be a Quarto document.

Note

You are likely familiar with word processors like MS Word or Google Docs. We will not be using these in this class. Instead, the words you would write in such a document, as well as your R code, will go into a Quarto document. You will render the document (more on what this means later) to produce a document that contains your words, your code, and the output of that code. Everything in one place, beautifully formatted!

R script

Great for learning, exploring, and tinkering.

Easy to rerun without attention to formatting or markdown.

Quarto

Great for communicating analyses and results.

Combines narrative explanation with code and its output (results).

Documentation

Blank slate

Start every RStudio session with a clear memory by turning off the automatic saving and restoring of your workspace (.RData files) when you quit and restart RStudio. This is important for reproducibility, debugging, and avoiding littering your computer with unnecessary files.

Set this via:

  1. Tools > Global Options.
  2. Uncheck “Restore .RData into Workspace at Startup”.
  3. Choose “Never” on the “Save workspace to .RData on exit”.
  4. Click “Apply” and “OK”.

Comprehensive R Archive Network (CRAN)

CRAN is like an App Store for R. It hosts R packages, documentation, and source code contributed by users worldwide. It is curated (i.e., packages are checked before they are published), which makes it very reliable.

R users can easily install, update, and share R packages using install.packages().

Packages

R comes with basic tools, but packages extend the capabilities of base R (what you already installed). An R package is like a toolbox: a collection of functions, data, and documentation that help you do specific tasks using R.


You’ll install each package (only once per system):

install.packages("tidyverse")


You’ll load each package (every time you use it):

library(tidyverse)
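Because installing happens once per computer while loading happens every session, a common pattern is to install a package only when it is missing. A minimal sketch; install_if_missing() is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper (not from any package): install only the
# packages that are not already installed on this computer
install_if_missing <- function(pkgs) {
  # Names of every package installed in your library paths
  installed <- rownames(installed.packages())
  missing <- setdiff(pkgs, installed)
  if (length(missing) > 0) {
    install.packages(missing)
  }
  invisible(missing)  # which packages needed installing
}
```

For example, install_if_missing(c("here", "tidyverse")) would install whichever of the two is absent and do nothing otherwise. You would still call library() for each package in every session.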

Support

Some help videos and further explanation:

R - Intro RStudio Interface

EasyR - Getting started with R the easy way

Getting Started

Your first code-along

Download and open code-along-01

Create a RStudio Project


To create a new project in RStudio, click: File > New Project.

In the New Project wizard that pops up, select: New Directory, then New Project.

Name the project “SOC6302” and click: Create Project.

This will launch you into a new RStudio Project inside a new folder called “SOC6302”.

R-script

Open RStudio, click the dropdown arrow next to the “New File” icon, and then select “R Script.”

Packages

We’ll use the following packages:

  • here (relative file paths)
  • tidyverse (data wrangling)
  • gssr (U.S. General Social Survey data)
  • gssrdoc (GSS documentation)

Install here and tidyverse

Let’s install the two packages that are available on CRAN.


Copy and paste the following code into your Console pane, then press Enter.

install.packages("here")


Then, do the same to install the tidyverse package.

install.packages("tidyverse")

Install gssr and gssrdoc

# Install 'gssr' from Kieran Healy's R-universe (CRAN is listed second for dependencies)
install.packages('gssr', repos =
  c('https://kjhealy.r-universe.dev', 'https://cloud.r-project.org'))

# Also recommended: install 'gssrdoc' as well
install.packages('gssrdoc', repos =
  c('https://kjhealy.r-universe.dev', 'https://cloud.r-project.org'))
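The repos argument in these calls tells install.packages() where to look, for that call only. Without it, R falls back on the repositories stored in the repos option, which you can inspect. A short sketch; the output depends on your setup (RStudio typically sets a CRAN mirror here).

```r
# Repositories install.packages() uses when you do not pass repos = ...
current_repos <- getOption("repos")
current_repos
```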

Load the packages

library(here)
library(tidyverse)
library(gssr)
library(gssrdoc)

Environment

# document your software environment (R version, operating system, package versions)
sessionInfo()
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8   
[3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gssrdoc_0.7.0      here_1.0.1         conflicted_1.2.0   summarytools_1.1.4
 [5] flextable_0.9.6    kableExtra_1.4.0   labelled_2.13.0    haven_2.5.4       
 [9] gssr_0.7           lubridate_1.9.3    forcats_1.0.0      stringr_1.5.1     
[13] dplyr_1.1.4        purrr_1.0.4        readr_2.1.5        tidyr_1.3.1       
[17] tibble_3.2.1       ggplot2_3.5.1      tidyverse_2.0.0   

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1        viridisLite_0.4.2       fastmap_1.2.0          
 [4] fontquiver_0.2.1        pacman_0.5.1            promises_1.3.3         
 [7] digest_0.6.37           timechange_0.3.0        mime_0.13              
[10] lifecycle_1.0.4         gfonts_0.2.0            magrittr_2.0.3         
[13] compiler_4.5.1          rlang_1.1.6             tools_4.5.1            
[16] utf8_1.2.4              yaml_2.3.10             data.table_1.15.4      
[19] knitr_1.50              askpass_1.2.0           curl_5.2.1             
[22] plyr_1.8.9              xml2_1.3.6              httpcode_0.3.0         
[25] withr_3.0.2             grid_4.5.1              fansi_1.0.6            
[28] gdtools_0.3.7           xtable_1.8-4            colorspace_2.1-0       
[31] scales_1.3.0            MASS_7.3-65             crul_1.4.2             
[34] cli_3.6.5               rmarkdown_2.29          crayon_1.5.3           
[37] ragg_1.3.2              generics_0.1.3          rstudioapi_0.17.1      
[40] reshape2_1.4.4          tzdb_0.4.0              cachem_1.1.0           
[43] pander_0.6.5            matrixStats_1.3.0       base64enc_0.1-3        
[46] vctrs_0.6.5             jsonlite_2.0.0          fontBitstreamVera_0.1.1
[49] hms_1.1.3               rapportools_1.2         systemfonts_1.1.0      
[52] magick_2.8.7            glue_1.8.0              codetools_0.2-20       
[55] stringi_1.8.4           gtable_0.3.5            later_1.4.2            
[58] munsell_0.5.1           pillar_1.9.0            htmltools_0.5.8.1      
[61] openssl_2.2.0           R6_2.6.1                tcltk_4.5.1            
[64] textshaping_0.4.0       rprojroot_2.0.4         evaluate_1.0.4         
[67] shiny_1.11.0            backports_1.5.0         memoise_2.0.1          
[70] fontLiberation_0.1.0    httpuv_1.6.16           pryr_0.1.6             
[73] Rcpp_1.0.14             zip_2.3.1               uuid_1.2-0             
[76] svglite_2.1.3           checkmate_2.3.2         officer_0.6.6          
[79] xfun_0.52               fs_1.6.6                pkgconfig_2.0.3        
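To share this information when asking for help, or to keep it alongside your results, you can write the sessionInfo() output to a text file. A sketch; the file name is hypothetical and the temporary folder is only for this example (in your project you might save it to an outputs folder instead).

```r
# Capture the printed sessionInfo() output as a character vector
info_lines <- capture.output(sessionInfo())

# Write it to a text file (temporary location for this sketch)
info_file <- file.path(tempdir(), "session-info.txt")
writeLines(info_lines, info_file)
```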

Project Structure

Let’s set up your project structure using the here package.

here()

First, let’s establish our project directory.

# show the file path to the root of the project
here()
[1] "C:/Users/Joanna/Documents/GitHub/Stats-for-Sociologists"
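here() also accepts folder and file names as separate arguments and joins them onto the project root, so the same line works on Windows, macOS, and Linux. A sketch assuming the raw data file named in the example folder structure (gss7924-raw.rda) will live in your data folder; here() builds the path whether or not the file exists yet.

```r
library(here)

# Path to a file inside the data folder, built from the project root
raw_path <- here("data", "gss7924-raw.rda")
raw_path
```

A later call such as load(raw_path) could then read the file without any absolute path.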

Example Folder Structure

Research Projects

project/
  data/
    gss7924-raw.rda
    gss7924-processed.Rdata
  scripts/
    clean_data.R
    analyze_data.R
    draft.qmd
  outputs/
    draft.html
    figures/
      plot1.png
      plot2.png
  readme.qmd
  project.Rproj

SOC6302

SOC6302/
  data/
    gss7924-raw.rda
    gss7924-processed.Rdata
  code-alongs/
  milestones/
  project/
    data/
    scripts/
    outputs/
  readme.qmd
  SOC6302.Rproj

Create a Folder Structure

with here() and dir.create()

# Create base folders
dir.create(here("data"), recursive = TRUE)
dir.create(here("code-alongs"), recursive = TRUE)
dir.create(here("milestones"), recursive = TRUE)
dir.create(here("project"), recursive = TRUE)

Create Sub-folders

with here() and dir.create()

# Create project sub-folders
dir.create(here("project", "data"), recursive = TRUE)
dir.create(here("project", "scripts"), recursive = TRUE)
dir.create(here("project", "outputs"), recursive = TRUE)
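The seven dir.create() calls can also be written as one loop. A sketch assuming the same folder names; showWarnings = FALSE silences the warning dir.create() gives when a folder already exists, so rerunning the script is harmless.

```r
library(here)

# Base folders plus the project sub-folders, as paths relative to the root
folders <- c("data", "code-alongs", "milestones", "project",
             file.path("project", c("data", "scripts", "outputs")))

# Create each folder (and any missing parents) under the project root
for (folder in folders) {
  dir.create(here(folder), recursive = TRUE, showWarnings = FALSE)
}
```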

Check your work

# Your SOC6302 class folder
list.files(path = here())
 [1] "_extensions"                  "_quarto-speaker.yml"         
 [3] "_quarto.yml"                  "code-alongs"                 
 [5] "data"                         "docs"                        
 [7] "labs"                         "lectures"                    
 [9] "mile-stones"                  "SOC6302_syllabus.qmd"        
[11] "Stats for Sociologists.Rproj"
# Your "Project" sub-folder
list.files(path = here("project"))
character(0)
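list.files() shows the contents of one folder at a time by default. To check the folder tree itself, base R's list.dirs() lists every sub-folder. A sketch assuming the here package is installed:

```r
library(here)

# All folders under project/, as paths relative to it
# ("" stands for project/ itself); character(0) if project/ does not exist
project_dirs <- list.dirs(here("project"), full.names = FALSE, recursive = TRUE)
project_dirs
```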

YAML

YAML Ain’t Markup Language